{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "edf0f3a9-808b-4193-993a-a63700a4d238",
   "metadata": {},
   "source": [
    "## 10 minutes to pandas\n",
    "- 参考网址：https://pandas.pydata.org/docs/user_guide/10min.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "initial_id",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2025-06-22T03:14:56.032323Z",
     "start_time": "2025-06-22T03:14:55.478982Z"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f536b4b-8e37-4fd8-b3dd-464796b78261",
   "metadata": {},
   "source": [
    "### 1. Basic data structures in pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de9d1aae-5ba9-4eb5-b295-1303e722af61",
   "metadata": {},
   "source": [
    "- $Series$类：表示一维带标签的数据，能够包含任何的数据结构；\n",
    "\n",
    "- $DataFrame$类：表示二维数据，具有行/列标签；"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "046d1504-1699-4d4d-baa0-58f0d044cf34",
   "metadata": {},
   "source": [
    "### 2. Object creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "46446278-d616-4be8-b191-901d9a6aff6b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    3.0\n",
       "2    5.0\n",
       "3    NaN\n",
       "4    6.0\n",
       "5    8.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 不指定Index，直接通过数组来创建一维的Series\n",
    "s = pd.Series([1, 3, 5, np.nan, 6, 8])\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "ffd1398e-1f74-41a4-b010-02d063e40102",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2025-06-23', '2025-06-24', '2025-06-25', '2025-06-26',\n",
       "               '2025-06-27', '2025-06-28', '2025-06-29'],\n",
       "              dtype='datetime64[ns]', freq='D')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过二维数组创建DataFrame结构\n",
    "dates = pd.date_range(\"20250623\", periods=7)\n",
    "dates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "883d617c-6ed8-4477-b0fd-18ab62ace069",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2025-06-23</th>\n",
       "      <td>8</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-24</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-25</th>\n",
       "      <td>18</td>\n",
       "      <td>18</td>\n",
       "      <td>1</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-26</th>\n",
       "      <td>13</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-27</th>\n",
       "      <td>19</td>\n",
       "      <td>8</td>\n",
       "      <td>17</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-28</th>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-29</th>\n",
       "      <td>15</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             A   B   C   D\n",
       "2025-06-23   8  18   0  19\n",
       "2025-06-24   1   0   5   5\n",
       "2025-06-25  18  18   1   9\n",
       "2025-06-26  13   4   5   9\n",
       "2025-06-27  19   8  17  19\n",
       "2025-06-28   9   1  11   4\n",
       "2025-06-29  15   5   5   5"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(np.random.randint(0, 20, (7, 4)), index=dates, columns=list('ABCD'))\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "75a46805-6263-46fa-9749-3340794fc493",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "      <th>E</th>\n",
       "      <th>F</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>test</td>\n",
       "      <td>foo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>train</td>\n",
       "      <td>foo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>test</td>\n",
       "      <td>foo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.0</td>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3</td>\n",
       "      <td>train</td>\n",
       "      <td>foo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     A          B    C  D      E    F\n",
       "0  1.0 2013-01-02  1.0  3   test  foo\n",
       "1  1.0 2013-01-02  1.0  3  train  foo\n",
       "2  1.0 2013-01-02  1.0  3   test  foo\n",
       "3  1.0 2013-01-02  1.0  3  train  foo"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 通过传入字典来创建DataFrame结构\n",
    "df2 = pd.DataFrame(\n",
    "    {\n",
    "        \"A\": 1.0,\n",
    "        \"B\": pd.Timestamp(\"20130102\"),\n",
    "        \"C\": pd.Series(1, index=list(range(4)), dtype=\"float32\"),\n",
    "        \"D\": np.array([3] * 4, dtype=\"int32\"),\n",
    "        \"E\": pd.Categorical([\"test\", \"train\", \"test\", \"train\"]),\n",
    "        \"F\": \"foo\",\n",
    "    })\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "0a5f4ef7-02e5-4d9b-9c0f-e679a5fba0c6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A          float64\n",
       "B    datetime64[s]\n",
       "C          float32\n",
       "D            int32\n",
       "E         category\n",
       "F           object\n",
       "dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 返回df2每一列的数据结构\n",
    "df2.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a31243d-8247-4251-b88b-808fc2815d5a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "6abeb069-0404-4fa5-9c1d-bbcab7a983a1",
   "metadata": {},
   "source": [
    "### 3. Viewing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "841f2280-0253-4fa0-a357-cee70fcf4863",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2025-06-23</th>\n",
       "      <td>8</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-24</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-25</th>\n",
       "      <td>18</td>\n",
       "      <td>18</td>\n",
       "      <td>1</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-26</th>\n",
       "      <td>13</td>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-27</th>\n",
       "      <td>19</td>\n",
       "      <td>8</td>\n",
       "      <td>17</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-28</th>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-29</th>\n",
       "      <td>15</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             A   B   C   D\n",
       "2025-06-23   8  18   0  19\n",
       "2025-06-24   1   0   5   5\n",
       "2025-06-25  18  18   1   9\n",
       "2025-06-26  13   4   5   9\n",
       "2025-06-27  19   8  17  19\n",
       "2025-06-28   9   1  11   4\n",
       "2025-06-29  15   5   5   5"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "0544d01b-1262-4810-bc3b-4344e7626876",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2025-06-23</th>\n",
       "      <td>8</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-24</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-25</th>\n",
       "      <td>18</td>\n",
       "      <td>18</td>\n",
       "      <td>1</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             A   B  C   D\n",
       "2025-06-23   8  18  0  19\n",
       "2025-06-24   1   0  5   5\n",
       "2025-06-25  18  18  1   9"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 审视DataFrame结构顶部和底部的数据\n",
    "df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "1725a190-3e47-4802-8d5a-4ae94ab16f64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>A</th>\n",
       "      <th>B</th>\n",
       "      <th>C</th>\n",
       "      <th>D</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2025-06-28</th>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>11</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2025-06-29</th>\n",
       "      <td>15</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             A  B   C  D\n",
       "2025-06-28   9  1  11  4\n",
       "2025-06-29  15  5   5  5"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "f64042a2-cd42-471c-bcbb-d8b33112690b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2025-06-23', '2025-06-24', '2025-06-25', '2025-06-26',\n",
       "               '2025-06-27', '2025-06-28', '2025-06-29'],\n",
       "              dtype='datetime64[ns]', freq='D')"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 审视DataFrame结构的Index和Columns标签\n",
    "df.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "77ee0a8d-c4ed-4e25-90a4-ab756f45cd78",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['A', 'B', 'C', 'D'], dtype='object')"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "4726fa59-af3e-40a1-84f0-135d83d2886e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['A', 'B', 'C', 'D']"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 将行标签或者列标签转化为list/numpy数组\n",
    "list(df.columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "40ff10aa-4d66-4ddc-8353-c4f2569a14a0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['A', 'B', 'C', 'D'], dtype='<U1')"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.array(df.columns).astype(str)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d1dc874-2bac-49e0-802e-90c14ca22278",
   "metadata": {},
   "source": [
    "注意：Numpy......"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1029a955-0c06-4ba3-becd-ead1ff89de7d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "b9b8d7c3-8a9d-4fbc-a8e2-d317bf436d7b",
   "metadata": {},
   "source": [
    "### 4. Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0426bfbd-e9ab-4db6-9cdc-52fd93de9c14",
   "metadata": {},
   "source": [
    "### 5. Missing data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67a633b9-017c-4c90-8623-8dcded9416b7",
   "metadata": {},
   "source": [
    "### 6.Operations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d0498a4-3d06-448d-9979-fb524f3c4bd9",
   "metadata": {},
   "source": [
    "### 7. Merge"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b20c0905-cb5d-446d-8664-9e97dcc88aa0",
   "metadata": {},
   "source": [
    "### 8.Grouping"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de92a468-269e-44ec-adae-ae6a35fb6f35",
   "metadata": {},
   "source": [
    "### 9.Reshaping"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bb9e567-9359-4ac5-b346-460662c37e36",
   "metadata": {},
   "source": [
    "### 10.Time series"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f46f40e-ee01-4c5a-9872-5a61ce6eea8b",
   "metadata": {},
   "source": [
    "### 11.Categoricals"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "798dc4c1-eba3-4bb4-9d16-0c66bdc3ceb6",
   "metadata": {},
   "source": [
    "### 12.Plotting"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bbb8186-05a7-4030-ac80-d6f710de0d87",
   "metadata": {},
   "source": [
    "### 13.Importing and exporting data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fcaba06-9d6d-44ea-975b-c935d43304a1",
   "metadata": {},
   "source": [
    "### 14.Gitchas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d09be2f9-75b8-4788-b616-8e14969ff307",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7deee16f-df8a-4a78-a46d-16a9a68065f2",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0fefd37-2a51-48fb-9627-7259d03ab8da",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ee062c3-d92c-47a9-af9b-a0db1855285b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
